1980: Analysis of Categorical Data in Weighted Cluster Sample Surveys

Authors

  • Jai W. Choi
  • Richard B. McHugh
Abstract

A weighted cluster sample survey design is frequently used in large demographic sample surveys. In the National Health Interview Survey conducted by the National Center for Health Statistics, households are often selected in clusters of four. In this survey, sociodemographic and health characteristics of all members of sample households are recorded. Such characteristics for each person interviewed are multiplied by a known weight that is approximately the inverse of the probability of being included in the sample on the basis of the post-stratified geographic and demographic domain of each individual. This type of weighting is necessary to estimate certain characteristics of the target population at reasonable cost in large sample survey situations. Cohen (1976) discussed the distribution of the chi-squared statistic from contingency tables in cluster sampling when clusters consist of two members. Altham (1976) generalized Cohen's results for clusters of M members. In the present research, these results are further extended to the weighted cluster sample survey. A new chi-squared statistic is used to analyze data from cluster sampling and weighted cluster sampling, and these two results are compared. This statistic is useful in the analysis of complex survey data for investigating the effect of weighting in cluster sample survey situations. Illustrative data from the 1975 National Health Interview Survey are analyzed by these new methods.
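
As a rough illustration of the weighting idea in the abstract, the sketch below builds a contingency table from inverse-probability weights and computes an ordinary Pearson chi-squared statistic on the weighted proportions. It is not the statistic proposed in the paper: it ignores the within-cluster dependence that Cohen (1976), Altham (1976), and the authors address. The function names, the record layout, and the sample values are all hypothetical.

```python
from collections import defaultdict


def weighted_table(records):
    """Sum weights per (row, column) cell.

    records: iterable of (row_category, col_category, weight) tuples.
    This record layout is an assumption made for the example.
    """
    table = defaultdict(float)
    for row, col, weight in records:
        table[(row, col)] += weight
    return dict(table)


def pearson_chi2(table, n):
    """Naive Pearson chi-squared for independence.

    The statistic is computed on cell proportions and scaled by the
    number of sampled persons n, so rescaling all weights by a constant
    leaves it unchanged. No correction for clustering is applied.
    """
    total = sum(table.values())
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    p = {(r, c): table.get((r, c), 0.0) / total for r in rows for c in cols}
    p_row = {r: sum(p[(r, c)] for c in cols) for r in rows}
    p_col = {c: sum(p[(r, c)] for r in rows) for c in cols}
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = p_row[r] * p_col[c]
            chi2 += (p[(r, c)] - expected) ** 2 / expected
    return n * chi2


# Hypothetical sample: (row category, column category, inclusion probability).
sample = [
    ("urban", "ill", 0.02), ("urban", "well", 0.02), ("urban", "well", 0.02),
    ("rural", "ill", 0.01), ("rural", "ill", 0.01), ("rural", "well", 0.01),
]

# Weight = inverse of the inclusion probability, as described in the abstract.
weighted = weighted_table((r, c, 1.0 / prob) for r, c, prob in sample)
unweighted = weighted_table((r, c, 1.0) for r, c, _ in sample)

chi2_weighted = pearson_chi2(weighted, n=len(sample))
chi2_unweighted = pearson_chi2(unweighted, n=len(sample))
```

Comparing `chi2_weighted` with `chi2_unweighted` gives a crude sense of how much weighting alone shifts the statistic; a proper analysis would also account for the cluster structure.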


Similar resources

Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure

K-Modes is an extension of the K-Means clustering algorithm, developed to cluster categorical data, in which the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching (or mismatching) measure. The weights of attribute values contribute substantially to clustering; thus in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of frequency...


Weighted delta factor cluster ensemble algorithm for categorical data clustering in data mining

Though many cluster ensemble approaches have come forward as potential and dominant methods for enhancing the robustness, stability and quality of individual clustering systems, it is frequently observed that this approach in most cases generates a final data partition with deficient information. The primary ensemble information matrix generated in the traditional cluster ensemble approaches resu...


Wheat and barley seed system in Syria: How diverse are wheat and barley varieties and landraces from farmer’s fields?

The present study described the diversity of wheat and barley varieties and landraces available in farmer’s fields in Syria using different indicators. Analysis of spatial and temporal diversity and coefficient of parentage along with measurements of agronomic and morphological traits were employed to explain the diversity of wheat and barley varieties or landraces grown by farmers in Syria. Farm...


Incremental entropy-based clustering on categorical data streams with concept drift

Clustering on categorical data streams is a relatively new field that has not received as much attention as static data and numerical data streams. One of the main difficulties in categorical data analysis is the lack of an appropriate way to define a similarity or dissimilarity measure on the data. In this paper, we propose three dissimilarity measures: a point-cluster dissimilarity measure (base...


Clustering From Categorical Data Sequences

The three-parameter cluster model is a combinatorial stochastic process that generates categorical response sequences by randomly perturbing a fixed clustering parameter. This clear relationship between the observed data and the underlying clustering is particularly attractive in cluster analysis, in which supervised learning is a common goal and missing data is a familiar issue. The model is w...




Publication date: 2002